Frontiers in Artificial Intelligence
Frontiers Media SA
Preprints posted in the last 90 days, ranked by how well they match Frontiers in Artificial Intelligence's content profile, based on 11 papers previously published here. The average preprint has a 0.08% match score for this journal, so anything above that is already an above-average fit.
Durgude, A.; Soni, N.; Raghuwanshi, K. C.; Awasthi, S.; Uniyal, K.; Yadav, S.; Kakani, A.; Kesharwani, P.; Mago, V.; Vathulaya, M.; Rao, N.; Chattopadhyay, D.; Kapoor, A.; Bhimsaria, D.
Burn injuries are a significant concern in developing countries due to limited infrastructure, and treating them remains a major challenge. The manual assessment of burn severity is subjective and depends, to a large extent, on individual expertise. Artificial intelligence can automate this task with greater accuracy and improved predictions, which can assist healthcare professionals in making more informed decisions while triaging burn injuries. This study established a model pipeline for detecting burn injuries in images using multiple deep learning models, including U-Net, DenseNet, ResNet, VGG, EfficientNet, and transfer learning with the Segment Anything Model 2 (SAM2). The task was divided into two stages: 1) background removal and 2) burn skin segmentation. ResNet50, used as an encoder with a U-Net decoder, performs better for the background removal task, achieving an accuracy of 0.9757 and an intersection over union (Jaccard index) of 0.9480. DenseNet169, used as an encoder with a U-Net decoder, performs well in burn skin segmentation, achieving an accuracy of 0.9662 and an intersection over union of 0.8504. The dataset collected during the project is available for download to facilitate further research and advancements (link to dataset: https://geninfo.iitr.ac.in/projects). Total body surface area (TBSA) was estimated from the predicted burn masks using scale-based calibration.
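For reference, the intersection over union (Jaccard index) and pixel accuracy reported above can be computed from binary masks as follows — a generic sketch, not the authors' pipeline code:

```python
import numpy as np

def iou(pred, target) -> float:
    """Jaccard index (intersection over union) for binary segmentation masks."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # Two empty masks are identical, so define IoU as 1 in that case.
    return float(intersection / union) if union else 1.0

def pixel_accuracy(pred, target) -> float:
    """Fraction of pixels classified correctly."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    return float((pred == target).mean())
```

Both metrics are computed per image and then averaged over the test set in typical segmentation evaluations.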
Ovcharuk, O. V.; Mazurets, O.; Molchanova, M. V.; Kirpich, A.; Skums, P.; Sobko, O. V.; Barmak, O.; Krak, I.; Yakovlev, S.
This study introduces a novel transformer-based ensemble framework for the multi-label detection of mental health disorders from social media posts. Unlike traditional multi-class approaches that often struggle with comorbidity, the proposed method employs a binary relevance strategy using fine-tuned DistilBERT models to identify co-occurring conditions, including depression, anxiety, and narcissistic personality disorder. To address class imbalance and optimize decision boundaries, the framework integrates a composite loss function (focal, dice, and log loss) and utilizes Youden's J statistic for threshold calibration. Validation on textual datasets demonstrates the efficacy of this approach, with an overall F1-score of 0.930 and AUC values exceeding 0.89. Comparative analysis suggests that decomposing complex diagnostic tasks into independent binary problems significantly reduces inter-class confusion relative to standard multi-class baselines. Furthermore, a qualitative error analysis highlights specific linguistic challenges, such as contextual polarity shifting, metaphorical ambiguity, and colloquial usage, that impact model specificity. The findings demonstrate the potential of the proposed framework as a robust screening tool for online mental health monitoring, while underscoring the necessity of human oversight to mitigate linguistic misinterpretations. Author summary: Mental health disorders such as depression, anxiety, and narcissistic personality disorder represent a major global health challenge. This work proposes a method that employs transformer-based deep learning models to analyze social media posts for mental health assessment. A significant hurdle in automated diagnosis is that these conditions often occur together (comorbidity), whereas many existing artificial intelligence (AI) systems are designed to detect only a single disorder at a time. This study proposes a solution using a "multi-label" deep learning framework.
Rather than relying on a single multi-class classifier, the approach utilizes an ensemble of specialized binary models, each trained to detect indicators of a specific disorder. This design reduces classification confusion between clinically similar conditions, such as depression and anxiety. Evaluated on publicly available datasets, the method achieved an F1-score of 0.930, outperforming existing approaches and achieving better separation between clinically similar disorders than traditional methods. Crucially, a detailed investigation beyond the standard statistical metrics was performed, examining specific model mistakes. It was found that, while the presented AI model is highly sensitive, it can be confused by specifics of language such as metaphors (e.g., "feeling like a pressure cooker"), negations (e.g., "I am not worried"), and colloquial clinical terms. These results highlight that AI is a powerful tool for early screening and continuous monitoring on social media, while it still requires careful calibration and human oversight to distinguish between genuine symptoms and everyday emotional expression. The findings demonstrate that analyzing social media texts with advanced machine learning techniques can serve as a powerful complementary tool to clinical diagnostics. While not intended to completely replace professional evaluation, the proposed approach can help identify potential risks, promote earlier detection of mental health disorders, support preventive interventions, and ultimately improve access to care.
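The threshold-calibration step mentioned above can be illustrated concretely: Youden's J statistic is J = sensitivity + specificity − 1, and the calibrated threshold is the score cutoff that maximizes it. A generic sketch of that statistic, not the authors' implementation:

```python
import numpy as np

def youden_threshold(y_true, scores):
    """Return the decision threshold maximizing Youden's J = sens + spec - 1."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    best_t, best_j = 0.5, -1.0
    for t in np.unique(scores):        # candidate cutoffs: observed scores
        pred = scores >= t
        sens = np.sum(pred & y_true) / y_true.sum()
        spec = np.sum(~pred & ~y_true) / (~y_true).sum()
        j = sens + spec - 1.0
        if j > best_j:
            best_j, best_t = j, float(t)
    return best_t, best_j
```

In a binary-relevance ensemble, this selection would be run once per disorder-specific model on held-out validation scores, giving each binary classifier its own operating point.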
Roesler, M. W.; Wells, C.; Schamberg, G.; Gao, J.; Harrison, E.; O'Grady, G.; Varghese, C.
Background: Predictive models employing machine learning algorithms are increasingly being used in clinical decision making, and improperly calibrated models can result in systematic harm. We sought to investigate the impact of class imbalance correction, a commonly applied preprocessing step in machine learning model development, on calibration and modelled clinical decision making in a large real-world context. Methods: A histogram-based gradient boosting classifier was trained on a highly imbalanced national dataset of >1.8 million patients undergoing surgery to predict the risk of 90-day mortality and complications after surgery. Class imbalance correction strategies, including random oversampling (ROS), synthetic minority oversampling technique (SMOTE), random under-sampling (RUS), and cost-sensitive learning (CSL), were compared to the natural distribution (natural). Models were tested and compared with classification metrics, calibration plots, decision curve analysis, and simulated clinical impact analysis. Results: The natural model demonstrated high performance (AUROC 0.94, 95% CI 0.94-0.95 for mortality; 0.84, 95% CI 0.84-0.85 for complications) and calibration (log loss 0.05, 95% CI 0.04-0.05 for mortality; 0.23, 95% CI 0.23-0.24 for complications). Class imbalance mitigation (CSL, ROS, RUS, and SMOTE) did not improve AUROC or AUPRC but increased recall and F1 scores at the expense of precision and accuracy. However, these methods severely compromised model calibration, leading to significant over-prediction of risks (up to a 62.8% increase), as further evidenced by increased log loss across all mitigation techniques. Decision curve analysis and clinical scenario testing confirmed that the natural model provided the highest net benefit. Conclusion: Class imbalance correction methods result in significant miscalibration, leading to possible harm when used for clinical decision making.
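The over-prediction observed after resampling has a well-known analytic form: a classifier trained at an artificial prevalence systematically inflates probabilities relative to the natural prevalence. A minimal sketch of the standard prior-shift correction (a textbook formula, not from this paper) illustrates the size of the distortion:

```python
def prior_correct(p: float, train_prev: float, true_prev: float) -> float:
    """Map a probability from a model trained at prevalence `train_prev`
    back to the natural prevalence `true_prev` (standard prior-shift correction)."""
    num = p * true_prev / train_prev
    den = num + (1.0 - p) * (1.0 - true_prev) / (1.0 - train_prev)
    return num / den
```

Under these assumptions, a score of 0.5 from a model trained on 50/50 rebalanced data corresponds to only a 2% risk when the natural prevalence is 2% — the kind of severe over-prediction the decision curve analysis above penalizes.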
Hong, S.; Mun, Y.; Lee, K. H.; Hahn, S.; Jang, S.; Kim, C.; Chung, K. S.; Sim, T.
Background: In-hospital cardiac arrest on general wards is often preceded by detectable physiological deterioration, yet conventional early warning scores demonstrate limited discrimination. We developed and performed preliminary validation of a transformer-based cardiac arrest prediction system for general ward patients. Methods: This retrospective study was conducted among general ward patients at a tertiary academic hospital in South Korea (Severance Hospital, 2013-2017). We developed Cardiac Arrest Risk Early Detection (CARED), a transformer-based system to predict 24-hour cardiac arrest risk. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC). Internal validation was performed using 5-fold cross-validation. Results: In internal validation, CARED achieved the highest discrimination with an AUROC of 0.939 (95% CI: 0.928-0.950), significantly outperforming machine learning and deep learning baseline models (all P < 0.05). Conclusions: CARED demonstrated high discriminative ability for predicting cardiac arrest in general ward patients. Comprehensive validations, including multidimensional performance assessments, comparisons with conventional early warning scores, and external validation in independent populations, are ongoing and will be reported in subsequent publications.
Ray, P.
Thyroid carcinoma is one of the most prevalent endocrine malignancies worldwide, and accurate preoperative differentiation between benign and malignant thyroid nodules remains clinically challenging. Current diagnostic methods depend on practitioners' personal judgment to evaluate both imaging results and separate clinical tests, creating inconsistency that can lead to incorrect medical evaluations. Combining radiological imaging with clinical information enables healthcare providers to make more reliable predictions about patient outcomes and to improve decision-making. This study introduces a deep learning framework that utilizes multiple data sources, combining magnetic resonance imaging (MRI) data with clinical text to predict thyroid cancer. The system uses a Vision Transformer (ViT) to extract advanced MRI scan features, while a domain-adapted language model processes clinical documents containing patient medical history, symptoms, and laboratory results. A cross-modal attention mechanism merges imaging data with textual information from different sources, helping to identify how the two types of data are interconnected. A classification layer then classifies the fused features to estimate the probability of malignancy. The experimental results show that the proposed multimodal system outperforms unimodal baselines, with higher accuracy, sensitivity, specificity, and AUC values, helping medical personnel make better preoperative decisions.
Jaakkola, M.; Karpijoki, H.; Saari, T.; Rainio, O.; Li, A.; Knuuti, J.; Virtanen, K.; Klen, R.
Background: Segmentation is a routine yet time-consuming and subjective step in the analysis of positron emission tomography (PET) images. Automatic methods have been suggested, but recent method development has focused on supervised approaches, and previously published unsupervised segmentation methods are not suited to the dynamic human total-body PET images now enabled by evolving scanner technology. Methods: In this study, we introduce an unsupervised, general-purpose automatic segmentation method for modern PET images consisting of tens of millions of voxels. We provide its implementation in an easy-to-use format and demonstrate its performance on two datasets of real human total-body images scanned using different radiotracers. Results and conclusions: Our results show that the suggested method can identify functionally distinct areas within anatomical organs. Combined with anatomical segments obtained from other imaging modalities, this offers great potential to improve clinically meaningful segmentation and reduce time-consuming manual work.
Pham, T. D.
Objective: This study investigates whether incorporating physiological coupling concepts into neural network design can support stable and interpretable feature learning for histopathological image classification under limited data conditions. Methods: A physiologically inspired architecture, termed CardioPulmoNet, is introduced to model interacting feature streams analogous to pulmonary ventilation and cardiac perfusion. Local and global tissue features are integrated through bidirectional multi-head attention, while a homeostatic regularization term encourages balanced information exchange between streams. The model was evaluated on three histopathological datasets involving oral squamous cell carcinoma, oral submucous fibrosis, and heart failure. In addition to end-to-end training, learned representations were assessed using linear support vector machines to examine feature separability. Results: CardioPulmoNet achieved performance comparable to several pretrained convolutional neural networks across the evaluated datasets. When combined with a linear classifier, improved classification performance and a higher area under the receiver operating characteristic curve were observed, suggesting that the learned feature embeddings are well structured for downstream discrimination. Conclusion: These results indicate that physiologically motivated architectural constraints may contribute to stable and discriminative representation learning in computational pathology, particularly when training data are limited. The proposed framework provides a step toward integrating physiological modeling principles into medical image analysis and may support future development of transferable and interpretable learning systems for histopathological diagnosis.
Hameed, R.; Haider Warraich, S.; Bhatti, A. H.
Background: Cardiovascular disease (CVD) remains the leading cause of mortality globally, with many events occurring in individuals without prior diagnosed conditions. Early risk stratification using accessible biomarkers could enable timely intervention and reduce adverse outcomes. Objective: To evaluate the performance of a machine learning-based risk prediction model utilizing routine physiological parameters for early detection of cardiovascular disease risk 4-6 weeks prior to clinical manifestation. Methods: A prospective cohort study was conducted from January 2024 to January 2025 involving 500 employees (300 males, 200 females; age range 35-50 years) recruited through ProMed Solutions Pvt. Ltd., Pakistan. Participants with no prior diagnosed cardiac conditions underwent weekly screenings measuring body mass index (BMI), blood pressure (systolic and diastolic), heart rate, single-lead electrocardiogram (ECG), and random blood glucose. A supervised machine learning algorithm generated cardiovascular risk scores. Monthly comprehensive cardiac evaluations, including complete blood work, 12-lead ECG, echocardiography, and ultrasound imaging, were performed by PAF Hospital, Islamabad, serving as clinical validation endpoints. Results: Over 26,000 individual screening sessions were completed with 98.4% adherence. The ML model achieved 96.0% overall accuracy (480/500), 71.05% sensitivity (27/38 true positives), and 98.05% specificity (453/462 true negatives). The model correctly identified 27 of 38 individuals who developed early-stage CVD during the study period (true positives), with 11 false negatives. Among 462 individuals without CVD development, 453 were correctly classified (true negatives) with 9 false positives. Positive predictive value was 75.0% (27/36) and negative predictive value was 97.6% (453/464).
Male participants with BMI 28-30 kg/m², pulse pressure 60-74 mmHg, ECG showing ventricular ectopy or ST-segment abnormalities, and random glucose 156-164 mg/dL demonstrated an 81.5% probability of early-stage CVD detection, confirmed through comprehensive clinical investigation. Conclusions: Integration of routine physiological parameters with machine learning algorithms demonstrates high specificity and acceptable sensitivity for early cardiovascular risk detection in asymptomatic working-age adults. The model's high negative predictive value suggests utility for population-level screening, though its modest sensitivity indicates that complementary clinical assessment remains essential for comprehensive risk stratification.
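The reported sensitivity, specificity, PPV, NPV, and accuracy follow directly from the confusion-matrix counts stated in the abstract (TP=27, FN=11, TN=453, FP=9); a generic helper reproduces them:

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard diagnostic-test metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # recall among true cases
        "specificity": tn / (tn + fp),   # recall among non-cases
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }
```

Plugging in the study's counts gives sensitivity 27/38 ≈ 71.05%, specificity 453/462 ≈ 98.05%, PPV 27/36 = 75.0%, NPV 453/464 ≈ 97.6%, and accuracy 480/500 = 96.0%, matching the values quoted above.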
Mensah, S.; Atsu, E. K. A.; Ammah, P. N. T.
Brain tumors are one of the most life-threatening diseases, requiring precise and timely detection for effective treatment. Traditional methods for brain tumor detection rely heavily on manual analysis of MRI scans, which is time-consuming, subjective, and prone to human error. With advancements in deep learning, Convolutional Neural Networks (CNNs) have become popular for medical image analysis. However, CNNs are limited in their ability to capture spatial hierarchies and pose variations, which reduces their accuracy, particularly for tasks like brain tumor segmentation where precise spatial relationships are crucial. This research introduces a hybrid Capsule Neural Network (CapsNet) and ResNet50 model designed to overcome the limitations of traditional CNNs by capturing both spatial and pose information in MRI scans. The proposed model leverages ResNet50 for feature extraction and CapsNet for handling spatial relationships, leading to more accurate segmentation. The study evaluates the model on the BraTS2020 dataset and compares its performance to state-of-the-art CNN architectures, including U-Net and pure CNN models. The hybrid model, featuring a custom 5-cycle dynamic routing algorithm to enhance capsule agreement for tumor boundaries, achieved 98% accuracy and an F1-score of 0.87, demonstrating superior performance in detecting and segmenting brain tumors. This study pioneers the systematic evaluation of the ResNet50 + CapsNet hybrid on the BraTS2020 dataset, with a tailored class weighting scheme addressing class imbalance and improving effectiveness in identifying irregularly shaped tumors and smaller tumor regions. The study offers a robust solution for automating brain tumor detection.
Future work will explore the use of Capsule Networks alone for brain tumor detection in MRI data and investigate alternative Capsule Network architectures, as well as their integration into clinical decision support systems.
Pradhan, A. M.; Shetty, V. A.; Gregor, C.; Graham, J. H.; Tusing, L.; Hirsch, A. G.; Hall, E.; Troiani, V.; Davis, M. P.; Bieler, D. L.; Romagnoli, K. M.; Kraus, C. K.; Piper, B. J.; Wright, E. A.
Introduction: Recreational and medical cannabis use (CU) information is often available within the electronic health record (EHR) in a format that is impractical for health care provider use. Transformation of free-text EHR documentation in notes to discrete elements is possible using natural language processing (NLP) and has the potential to characterize CU efficiently. The objective of this study was to develop an NLP algorithm to identify documentation of CU within EHR unstructured clinical notes. Methods: We identified EHR notes with cannabis-related terminologies through a keyword search among all Geisinger patients with at least one encounter between 1/1/2013 and 6/30/2022. We trained four NLP models to classify notes into six categories based on time, context, and reliability of CU documentation identified through manual annotation. We compared the demographic characteristics of patients with positive classification for CU using the best-performing model to those of the overall population. Results: Of the over 1.7 million eligible patients, 150,726 (8.6%) were flagged as cannabis users. Bio-ClinicalBERT, a transformer-based NLP model, achieved close to human performance in classifying CU (weighted precision = 91.4, recall = 93.3, F-score = 92.4). Cannabis users had higher BMI and were at least nine-fold more likely to use tobacco, alcohol, and illicit substances. Conclusion: Our study evaluated the prevalence of CU documentation across the entire corpus of EHR notes without population segmentation. The NLP methodologies used achieved performance close to that of human annotation and laid the foundation for identifying and classifying CU within unstructured data sources, with future applications in research and patient care.
Plain Language Summary: Marijuana, also known as cannabis, may impact the health of patients, yet it is not routinely captured in medical records, and when documented, it is often found in unstructured formats (e.g., progress notes) rather than in discrete fields. Incomplete and unstructured capture limits many functional capabilities within the EHR that enhance patient care (e.g., drug interactions, notifications) and limits researchers' ability to identify patients routinely exposed to marijuana. The transformation of free-text documentation of cannabis use (CU) into discrete elements can be performed using natural language processing (NLP). The objective of this study was to develop an NLP model to identify CU in unstructured clinical notes in the EHR. We examined the EHRs of Geisinger patients in Pennsylvania over a 10-year period. Among 1.7 million patients, 9% were identified as cannabis users. One of the NLP models tested, Bio-ClinicalBERT, achieved the highest performance. Cannabis users had a higher BMI and were ten-fold more likely to be tobacco users, ten-fold more likely to use alcohol, and nine-fold more likely to use illicit substances. NLP can be used to better understand the risks and benefits of CU at a population level and may improve patient identification to assist clinical decision-making. Future CU epidemiological research should continue to explore other avenues to automate and improve CU documentation by leveraging rapidly evolving technologies, such as artificial intelligence-driven tools.
Kumar, S. N.; K S, G.; Chinnakanu, S. J.; Krishnan, H.; M, N.; Subramaniam, S.
Non-alcoholic fatty liver disease (NAFLD) is a globally prevalent hepatic condition caused by the buildup of fat in the liver. It is frequently associated with metabolic comorbidities such as hypertension, cardiovascular disease (CVD), and prediabetes. However, early detection remains challenging due to its asymptomatic progression, and existing primary diagnostic methods, such as imaging or liver biopsy, are often expensive and inaccessible in rural areas. This study proposes a two-stage, interpretable machine learning pipeline for the non-invasive and cost-effective prediction of NAFLD and its key comorbidities using routine clinical parameters. The NAFLD prediction model was developed using the XGBoost algorithm, trained on a hybrid dataset that combines real patient data with rule-based synthetic data generated by simulating clinically plausible cases. Upon a NAFLD-positive prediction, three separate XGBoost models, trained on data labelled using clinical thresholds, assess individual risks for hypertension, cardiovascular disease, and prediabetes. Explainability is obtained using SHAP (SHapley Additive exPlanations), which provides insight into feature relevance, while biomarker radar plots help in the visual interpretation of comorbidities. A user-friendly Streamlit interface enables real-time interaction with the tool for potential clinical application. The NAFLD model demonstrated robust performance, while the models used for predicting comorbidities achieved perfect performance, which may reflect the limited dataset size used in the second stage. This work underscores the potential of AI-driven tools in NAFLD diagnosis, particularly when combined with explainable AI methods.
Chen, H.; Ye, J.
Objective: To develop and validate machine learning models for predicting blood pressure (BP) control status using demographic characteristics and longitudinal BP history. Methods: This retrospective cohort study analyzed deidentified data from a multi-site primary care quality improvement program for hypertension management. Participants included adults with diagnosed hypertension (N=23,002) or heart failure (N=1,137) who had at least 2 clinical visits. The primary outcome was BP control status, defined as systolic BP less than 140 mm Hg and diastolic BP less than 90 mm Hg. Five machine learning algorithms (logistic regression, decision tree, random forest (RF), support vector machine, and extreme gradient boosting) were compared using accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUROC). Feature importance was assessed using SHapley Additive exPlanations (SHAP) values. Results: Among 23,002 hypertensive patients (mean [SD] age, 65.25 [13.95] years; 13,015 [56.58%] female), the RF model achieved the highest performance with an AUROC of 0.88 (95% CI, 0.86-0.90) using BP history features alone and 0.87 (95% CI, 0.85-0.89) with combined features. BP history substantially outperformed demographic factors (AUROC, 0.60; 95% CI, 0.58-0.62). Mean systolic BP, maximum systolic BP, and maximum diastolic BP were the most influential predictors. In the heart failure cohort (N=1,137; mean [SD] age, 75.15 [15.05] years; 579 [50.92%] female), the RF model achieved an AUROC of 0.93 (95% CI, 0.90-0.96) with combined features. The model demonstrated accuracy of 0.77 (95% CI, 0.76-0.78), precision of 0.78 (95% CI, 0.76-0.79), recall of 0.73 (95% CI, 0.71-0.75), and F1-score of 0.75 (95% CI, 0.74-0.77) for the hypertension cohort.
Conclusions: Machine learning models incorporating longitudinal BP history effectively predicted hypertension control status, with BP variability metrics showing substantially greater predictive value than demographic characteristics. These findings suggest that systematic incorporation of historical BP patterns into clinical decision-support systems may enhance personalized hypertension management.
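The top predictors named above (mean systolic BP, maximum systolic BP, maximum diastolic BP) are simple summaries of a patient's visit history. A hypothetical sketch of such feature extraction — the feature names and the <140/90 mm Hg control definition follow the abstract, while the function itself is purely illustrative:

```python
import statistics

def bp_history_features(readings):
    """Summary features from longitudinal BP readings.

    `readings`: list of (systolic, diastolic) tuples, ordered by visit.
    """
    sys_vals = [s for s, _ in readings]
    dia_vals = [d for _, d in readings]
    return {
        "mean_sys": statistics.fmean(sys_vals),
        "max_sys": max(sys_vals),
        "max_dia": max(dia_vals),
        # variability metric: standard deviation of systolic readings
        "sd_sys": statistics.stdev(sys_vals) if len(sys_vals) > 1 else 0.0,
        # control status at the latest visit, per the <140/90 mm Hg definition
        "controlled_last": sys_vals[-1] < 140 and dia_vals[-1] < 90,
    }
```

Features of this form can be fed to any of the five compared classifiers; the point of the study is that these history summaries carry far more signal than demographics alone.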
Shishir, F. S. S.; Harvey, C. J.; Gupta, A.; Noheria, A.; Shomaji, S.
Electrocardiogram (ECG) is a widely available, non-invasive diagnostic tool used for cardiovascular screening and provides essential insights into heart rhythm, structure, and function. However, the high dimensionality of ECGs and their entanglement with demographic information pose challenges for building fair and privacy-preserving machine learning models. ECG signals inherently encode soft biometric attributes such as sex, age, and race, which may introduce bias and raise privacy concerns in data-sharing environments. To address these challenges, we propose a deep learning framework that learns clinically relevant ECG representations while suppressing sensitive demographic information. We leverage a variational autoencoder (VAE) with a dual-discriminator architecture: one adversarial branch reduces soft biometric encoding, while the other preserves clinically important discrimination of reduced left ventricular ejection fraction (LVEF). The privacy-preserving reconstructed ECGs reduced the identifiability of soft biometrics by independent CNN models, with AUROC for sex 0.59 (from original 0.79), age 0.63 (from 0.78), and race 0.57 (from 0.69), while retaining clinically useful predictions such as reduced LVEF 0.82 (from 0.86), left ventricular hypertrophy 0.72 (from 0.75), and 5-year mortality 0.67 (from 0.66). These findings demonstrate the effectiveness of our approach for retaining the clinical utility of ECG data while protecting patient privacy.
Pemmasani, S. K.; Athmakuri, S.; R G, S.; Acharya, A.
The neurological health score (NHS), indicating the health of the brain and nervous system, helps identify high-risk individuals and recommend lifestyle modifications. In the present study, we developed an NHS based on genetic, lifestyle, and biochemical variables associated with eight neurological disorders: dementia, stroke, Parkinson's disease, amyotrophic lateral sclerosis, schizophrenia, bipolar disorder, multiple sclerosis, and migraine. UK Biobank data from Caucasian individuals were used to develop the model, and data from individuals of Indian ethnicity were used to validate it. Logistic regression and XGBoost algorithms were used to select the significant variables for the disorders. The NHS developed from the selected variables remained significant after adjusting for age and sex (AUC: 0.6, OR: 0.95). A higher NHS was associated with a lower risk of neurological disorders and better social well-being. The highest NHS group (top 25%) showed 1.3 times lower risk compared to the rest of the individuals. The results of our study help in developing a framework for quantifying neurological health in a clinical setting.
Addepalli, V. r.; Abdalnabi, N.; Kummerfeld, E.; Hembroff, G.; Kiselica, A. M.; Rao, P.; Lee, K.
Alzheimer's disease (AD) is increasing in prevalence, and early detection is essential for timely care. Clinical services face growing demand, leading to delays in diagnostic appointments and increasing the risk of disease progression before evaluation. This work examines artificial intelligence (AI) methods for assessing cognitive status from linguistic features. The proposed architecture uses small language models (SLMs) to analyze speech patterns, and its compact design allows deployment on mobile devices. Recent reasoning-focused models, including Deepseek-R1 and Llama, were evaluated for dementia classification. Multiple fine-tuning strategies were compared, and the best model achieved 91% accuracy and an F1 score. The findings show that AI systems built on SLMs can achieve performance comparable to large language models, indicating their potential as efficient tools that may support health care providers through accessible pre-clinical screening for AD.
Chae, R.; Zhou, J.; Chou, O. H. I.; Yang, B.; Pu, H.; Tse, G.; Cheung, B. M. Y.; Zhu, T.; Car, J.; Lu, L.
Heart failure (HF) is one of the major causes of morbidity and mortality globally, necessitating accurate tools for health outcome prediction and risk stratification. In this study, we propose an interpretable multimodal machine learning framework integrating four clinical data modalities (i.e., demographics, medications, laboratory tests, and electrocardiograms [ECGs]) to predict 30-day all-cause mortality and hospital readmission in HF patients. Using clinical data from 2,868 HF patients across 43 local hospitals in Hong Kong, we trained and evaluated ten machine learning models for HF risk prediction, with the best-performing model achieving an area under the receiver operating characteristic curve (AUC) of 0.881 for mortality and 0.709 for readmission. Notably, laboratory tests and ECG features dominate predictive power, and their combination alone yielded near-optimal results (AUC: 0.872), suggesting that these two modalities may be adequate for effective risk prediction in resource-constrained settings. SHapley Additive exPlanations (SHAP) analysis identified serum albumin, high-sensitivity troponin I, lactate dehydrogenase, and QT interval dispersion as key predictors. Feature redundancy analysis further revealed strong correlations within laboratory tests and ECG features, suggesting opportunities for model simplification. To the best of our knowledge, this is the first study to comprehensively evaluate diverse configurations of four data modalities for HF risk prediction through ablation analysis, quantifying the marginal gains of each data modality and their combinations. Our findings demonstrate that interpretable multimodal machine learning models can enhance risk prediction in HF patients, supporting personalized management and scalable deployment across diverse healthcare settings.
Bourriez, N.; Mahanta, S. K.; Svatko, I.; Lacassagne, E.; Atchade, A.; Leonardi, F.; Massougbodji, A.; Cohen, E.; Argy, N.; Cottrell, G.; Genovesio, A.
Malaria affects almost 263 million people worldwide, most of whom live in sub-Saharan countries. In a strategy to reduce malaria-related mortality and limit transmission, diagnosis in endemic areas needs to be immediately available in the field, easy to perform, and cheap. Therefore, it currently relies heavily on microscopic examination of blood smears. However, several studies comparing the sensitivity of this approach with qPCR, considered the most sensitive method albeit not available in the field, found that up to half of the infected population failed to be detected by microscopy alone because no visible parasites could be found in blood smears. These so-called submicroscopic infections pose a diagnostic challenge, yet represent a huge reservoir for malaria transmission. In this study, we hypothesized that qPCR results could be predicted by deep learning from subtle cell signals present in thin blood smear images, even in the absence of visible parasites, making a sensitive diagnostic directly available in the field using a microscope and a smartphone. To test this hypothesis, we acquired a large smartphone-based dataset of blood smear images from samples tested both by microscopy and qPCR. We then focused exclusively on the "negative" slides from the microscopic diagnostic point of view, among which half were qPCR positive. A range of standard deep learning models were evaluated to best predict the qPCR result from these microscopy images, using various backbones along with various aggregation functions at the slide level, from a simple vote to multiple instance learning with attention. Our results show that qPCR results can be predicted from parasite-free blood smear images with 62.0% (±2.5 across 4 folds) accuracy, reaching 67.2% (±9.6 across 4 folds) sensitivity.
We then used generative models to investigate the subtle morphological variations occurring in red blood cells that may contribute to predicting malaria infection in the absence of parasites. Leveraging thin blood smears and portable deep learning, we established the first proof of concept that qPCR sensitivity can be approached through the detection of submicroscopic infections directly in the field without additional infrastructure, which could significantly improve malaria surveillance and elimination efforts.
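The slide-level aggregation functions compared above range from a simple vote to attention-based Multiple Instance Learning. A minimal NumPy sketch of attention pooling in the style of Ilse et al. follows; `V` and `w` stand in for learned parameters and the sizes are illustrative, not taken from the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_mil_pool(instance_feats, V, w):
    """Attention-based MIL pooling: each instance (e.g. one field-of-view
    crop from a slide) gets a learned weight alpha_i = softmax(w^T tanh(V h_i)),
    and the slide embedding is the weighted sum of instance features."""
    scores = np.tanh(instance_feats @ V.T) @ w   # (n_instances,)
    alpha = softmax(scores)                      # weights sum to 1
    return alpha @ instance_feats, alpha         # slide embedding, weights

rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 8))   # 5 instances, 8-dim features per instance
V = rng.normal(size=(4, 8))       # hypothetical learned projection
w = rng.normal(size=4)            # hypothetical learned attention vector
slide_emb, alpha = attention_mil_pool(feats, V, w)
```

Unlike a hard vote, the attention weights are differentiable and reveal which crops drove the slide-level prediction, which matters when the signal is subtle and parasite-free.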
Naqvi, S. A. R.; Ahmed, S. B.
Glaucoma is a leading cause of irreversible blindness and requires early detection to prevent vision loss. This study proposes a novel framework for automated glaucoma detection using fundus images, integrating deep learning and explainable artificial intelligence (XAI). By unifying five public datasets (RIM-ONE, ACRIMA, DRISHTI-GS, REFUGE, and EyePACS), we created a diverse dataset to enhance model generalizability. An ensemble of five deep learning models, comprising three convolutional neural networks (ResNet50, EfficientNet-B0, DenseNet121) and two transformer-based models (Vision Transformer, Swin Transformer), was trained for robust classification. Grad-CAM and attention rollout visualizations provided insight into model decision making, highlighting critical regions such as the optic disc and cup. These visualizations, combined with ensemble predictions, were processed by Google Gemini 1.5 Flash to generate clinician-style diagnostic reports. The ensemble model achieved a test accuracy of 95.38% and an AUC of 0.99, outperforming individual models. This framework improves diagnostic accuracy and interpretability, bridging the gap between AI predictions and clinical utility, with potential for future integration into real-world ophthalmic workflows.
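A common way to combine such heterogeneous classifiers is soft voting, i.e. averaging the per-model class probabilities before taking the argmax. The abstract does not specify the exact combination rule, so the sketch below is illustrative only.

```python
import numpy as np

def soft_vote(prob_list, weights=None):
    """Soft-voting ensemble: average (optionally weighted) per-model
    class probabilities, then predict the argmax class.
    prob_list: list of (n_samples, n_classes) arrays, one per model."""
    probs = np.stack(prob_list)                 # (n_models, n, c)
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=float)
    weights /= weights.sum()
    avg = np.tensordot(weights, probs, axes=1)  # (n, c)
    return avg, avg.argmax(axis=1)

# Three hypothetical models scoring two fundus images (glaucoma vs normal).
m1 = np.array([[0.9, 0.1], [0.4, 0.6]])
m2 = np.array([[0.8, 0.2], [0.3, 0.7]])
m3 = np.array([[0.7, 0.3], [0.6, 0.4]])
avg, pred = soft_vote([m1, m2, m3])
```

Averaging probabilities rather than hard labels lets a confident minority (e.g. the transformers on one image) outweigh an uncertain majority, which is one reason ensembles of diverse backbones often beat any single model.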
Liu, R.; Azzam, M.; Zabik, N.; Wan, S.; Blackford, J.; Wang, J.
In 2024, approximately 30% of U.S. adolescents reported having consumed alcohol at least once in their lifetime, with about 25% of these individuals engaging in binge drinking. Adolescent alcohol use is associated with neurodevelopmental impairments, elevated risk of later alcohol use, and mental health disorders. These findings underscore the importance of identifying the variables driving adolescent alcohol use and leveraging them for early identification and targeted intervention. Previous studies have typically developed machine-learning classification models that use neuroimaging data in combination with limited clinical measurements. Neuroimaging data are expensive and difficult to obtain at scale, whereas clinical measures are more practical for large-scale screening due to their low cost and widespread accessibility. However, clinical-only approaches to alcohol drinking classification remain largely underexplored. Furthermore, prior studies have often focused on adults, limiting generalizability to the broader adolescent population. Additionally, confounding factors such as age and substance use, which are strongly correlated with alcohol consumption, have often been inadequately addressed, potentially inflating classification performance. Finally, class imbalance remains a persistent challenge, with prior attempts yielding only limited improvements. To address these limitations, we propose FocalTab, a framework that integrates TabPFN with focal loss for robust generalization and effective mitigation of class imbalance. The approach also incorporates an initial preprocessing step that removes the confounding effects of age and substance use. We compare FocalTab against state-of-the-art methods across different variable selections and dataset settings.
FocalTab achieves the highest accuracy (84.3%) and specificity (80.0%) in the most stringent setting, in which both age and substance use variables were excluded, whereas competing models drop to near-chance specificity (12-24%). We further applied SHapley Additive exPlanations (SHAP) analysis to identify key clinical predictors of drinker classification, supporting enhanced screening and early intervention.
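Focal loss, which FocalTab combines with TabPFN, mitigates class imbalance by down-weighting easy, well-classified examples so the rare class is not swamped during training. A minimal binary-case sketch of the standard formulation from Lin et al. (not the authors' implementation):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).
    gamma > 0 shrinks the loss of confident, correct predictions;
    alpha re-balances the positive vs negative class contributions.
    gamma = 0 recovers alpha-weighted cross-entropy."""
    p = np.clip(np.asarray(p, dtype=float), 1e-7, 1 - 1e-7)
    y = np.asarray(y)
    pt = np.where(y == 1, p, 1 - p)        # probability of the true class
    a = np.where(y == 1, alpha, 1 - alpha)
    return float((-a * (1 - pt) ** gamma * np.log(pt)).mean())

# An easy, correct prediction contributes far less than a hard error,
# so gradients concentrate on the misclassified (often minority) cases.
easy = focal_loss([0.95], [1])
hard = focal_loss([0.10], [1])
```

The `(1 - p_t)^gamma` factor is the whole trick: at `p_t = 0.95` it multiplies the cross-entropy term by 0.0025, while a badly wrong prediction keeps nearly its full loss.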
Vanegas Mueller, E.; Harford, M.; He, L.; Banerjee, A.; Leeson, P.; Villarroel, M.
Sudden cardiac death risk is 2-3-fold higher in athletes than in non-athletes. We classify sports-related cardiac arrhythmias using a novel explainability framework comprising data analysis, model interpretability, post-hoc visualisation, and systematic assessment. Two neural networks, one with interpretable sinc convolution and one with standard convolution, were trained on general-population ECGs (PhysioNet, n=88,253, 30 arrhythmias, three continents) and tested on professional footballers (PF12RED, n=161) via domain adaptation for normal sinus rhythm (NSR), sinus bradycardia (SB), incomplete right bundle branch block (IRBBB), and T-wave inversion (TWI). Sinc convolution achieved superior NSR detection (AUROC 0.75 vs 0.70), whilst standard convolution excelled at SB (0.74 vs 0.73), IRBBB (0.66 vs 0.58), and TWI (0.59 vs 0.54). Gradient-weighted Class Activation Mapping revealed that sinc models focus on physiologically relevant ECG segments (the PR interval for NSR/SB and the T wave for TWI). We hypothesise that sinc convolution better captures periodic rhythms but struggles with complex morphological patterns, suggesting architectural choice should align with underlying cardiac pathophysiology. Graphical abstract. Abbreviations: AI, artificial intelligence; AUPRC, area under the precision-recall curve; AUROC, area under the receiver operating characteristic curve; Conv, convolution; ECG, electrocardiogram; Grad-CAM, gradient-weighted class activation mapping; IAVB, first-degree atrioventricular block; IRBBB, incomplete right bundle branch block; LAD, left axis deviation; LBBB, left bundle branch block; LVH, left ventricular hypertrophy; NSR, normal sinus rhythm; QT, QT interval; RAD, right axis deviation; RBBB, right bundle branch block; RVH, right ventricular hypertrophy; SA, sinus arrhythmia; SB, sinus bradycardia; TWI, T-wave inversion; xAI, explainable artificial intelligence.
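Sinc convolution (as in SincNet) constrains each first-layer kernel to be a band-pass filter parametrised only by its two learnable cutoff frequencies, which is what makes the learned filters physiologically interpretable. A minimal NumPy sketch of one such kernel follows; the 500 Hz sampling rate and the cutoff values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sinc_bandpass(f1, f2, kernel_len, fs=500.0):
    """One SincNet-style kernel: a band-pass FIR filter built as the
    difference of two ideal (windowed) sinc low-pass filters with
    cutoffs f1 < f2 in Hz. During training only f1 and f2 would be
    learned; the sinc shape itself is fixed."""
    t = (np.arange(kernel_len) - (kernel_len - 1) / 2) / fs
    low_pass = lambda fc: 2 * fc * np.sinc(2 * fc * t)  # ideal low-pass
    h = low_pass(f2) - low_pass(f1)                     # pass band [f1, f2]
    h *= np.hamming(kernel_len)                         # soften truncation
    return h / np.abs(h).sum()                          # normalise

# Hypothetical ECG-range filter: pass roughly 1-40 Hz at 500 Hz sampling.
h = sinc_bandpass(1.0, 40.0, kernel_len=101)
```

Because each filter is described by two numbers instead of `kernel_len` free weights, the learned pass bands can be read off directly, complementing the Grad-CAM analysis of which ECG segments the model attends to.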